From Volcano to Toyshop: Adaptive Discriminative Region Discovery for Scene Recognition
As deep learning approaches to scene recognition have emerged, they have continued
to leverage discriminative regions at multiple scales, building on practices
established by conventional image classification research. However, these approaches
remain largely generic and do not carefully consider the special properties of
scenes. In this paper, inspired by the intuitive differences between scenes and
objects, we propose Adi-Red, an adaptive approach to discriminative region
discovery for scene recognition. Adi-Red uses a CNN classifier, which was
pre-trained using only image-level scene labels, to discover discriminative
image regions directly. These regions are then used as a source of features to
perform scene recognition. The use of the CNN classifier makes it possible to
adapt the number of discriminative regions per image using a simple, yet
elegant, threshold, at relatively low computational cost. Experimental results
on the scene recognition benchmark dataset SUN397 demonstrate the ability of
Adi-Red to outperform the state of the art. Additional experimental analysis on
the Places dataset reveals the advantages of Adi-Red and highlights how they
are specific to scenes. We attribute the effectiveness of Adi-Red to the
ability of adaptive region discovery to avoid introducing noise, while also not
missing out on important information.
Comment: To appear at the ACM International Conference on Multimedia (ACM MM 2018). Code available at https://github.com/ZhengyuZhao/Adi-Red-Scen
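The adaptive thresholding idea above can be sketched with plain numpy: given an activation map from a scene classifier, keep the local maxima that exceed a threshold, so the number of discovered regions adapts per image. This is an illustrative simplification, not the authors' implementation; the function name and map shape are ours.

```python
import numpy as np

def adaptive_regions(act_map, threshold):
    """Return (row, col) peaks of an activation map above `threshold`.

    The number of regions adapts to the image: a map with one dominant
    peak yields one region, a cluttered map yields several.
    """
    h, w = act_map.shape
    padded = np.pad(act_map, 1, mode="constant", constant_values=-np.inf)
    peaks = []
    for i in range(h):
        for j in range(w):
            v = padded[i + 1, j + 1]
            nbhd = padded[i:i + 3, j:j + 3]  # 3x3 window centred on (i, j)
            if v >= threshold and v == nbhd.max():
                peaks.append((i, j))
    return peaks
```

Lowering the threshold admits more (weaker) regions; raising it keeps only the dominant one, which is the adaptivity the abstract describes.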
FakeCLR: Exploring Contrastive Learning for Solving Latent Discontinuity in Data-Efficient GANs
Data-Efficient GANs (DE-GANs), which aim to learn generative models with a
limited amount of training data, encounter several challenges for generating
high-quality samples. Since data augmentation strategies have largely
alleviated training instability, how to further improve the generative
performance of DE-GANs has become a research hotspot. Recently, contrastive
learning has shown great potential for increasing the synthesis quality of
DE-GANs, yet the underlying principles are not well explored. In this paper, we revisit and compare
different contrastive learning strategies in DE-GANs, and identify that (i) the
current bottleneck of generative performance is the discontinuity of the latent
space, and (ii) compared to other contrastive learning strategies,
Instance-perturbation promotes latent-space continuity, which brings the
major improvement to DE-GANs. Based on these observations, we propose FakeCLR,
which only applies contrastive learning on perturbed fake samples, and devises
three related training techniques: Noise-related Latent Augmentation,
Diversity-aware Queue, and Forgetting Factor of Queue. Our experimental results
manifest the new state of the arts on both few-shot generation and limited-data
generation. On multiple datasets, FakeCLR acquires more than 15% FID
improvement compared to existing DE-GANs. Code is available at
https://github.com/iceli1007/FakeCLR.Comment: Accepted by ECCV202
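The core idea of applying contrastive learning only to perturbed fake samples can be illustrated with a minimal InfoNCE loss in numpy, where the two views of each fake sample come from instance perturbation. This is our sketch of the standard InfoNCE objective, not the released FakeCLR code, and it omits the queue-based techniques.

```python
import numpy as np

def info_nce(z1, z2, tau=0.1):
    """InfoNCE loss between two batches of embeddings.

    z1[i] and z2[i] are embeddings of two noise-perturbed copies of the
    same fake sample (instance perturbation); all other rows act as
    negatives. Lower loss means matched pairs are pulled together.
    """
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / tau                      # pairwise cosine / temperature
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    return -np.log(np.diag(probs)).mean()         # positives on the diagonal
```

Matched pairs should produce a much lower loss than mismatched ones, which is the signal a DE-GAN discriminator head can be trained on.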
MagicFusion: Boosting Text-to-Image Generation Performance by Fusing Diffusion Models
The advent of open-source AI communities has produced a cornucopia of
powerful text-guided diffusion models trained on various datasets.
However, few explorations have been conducted into ensembling such models to
combine their strengths. In this work, we propose a simple yet effective method called
Saliency-aware Noise Blending (SNB) that can empower the fused text-guided
diffusion models to achieve more controllable generation. Specifically, we
experimentally find that the responses of classifier-free guidance are highly
related to the saliency of the generated images. Thus, we propose to trust different
models in their areas of expertise by blending the predicted noises of two
diffusion models in a saliency-aware manner. SNB is training-free and can be
completed within a DDIM sampling process. Additionally, it can automatically
align the semantics of two noise spaces without requiring additional
annotations such as masks. Extensive experiments show the impressive
effectiveness of SNB in various applications. Project page is available at
https://magicfusion.github.io/
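A minimal sketch of saliency-aware blending: per pixel, keep the noise prediction of the model whose classifier-free-guidance response is stronger. The function, the hard 0/1 mask, and the argument names are our simplifications of SNB, not the paper's exact formulation.

```python
import numpy as np

def saliency_noise_blend(eps_a, eps_b, guid_a, guid_b):
    """Blend the predicted noises of two diffusion models.

    `guid_a` / `guid_b` stand in for each model's classifier-free-guidance
    response (the cond - uncond prediction gap); per pixel we trust the
    model whose response magnitude is larger. All arrays share one shape.
    """
    sal_a = np.abs(guid_a)
    sal_b = np.abs(guid_b)
    mask = (sal_a >= sal_b).astype(eps_a.dtype)  # 1 where model A is more salient
    return mask * eps_a + (1.0 - mask) * eps_b
```

In a real sampler this blend would replace the single-model noise prediction inside each DDIM step, which is why the method needs no training.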
Unified Discrete Diffusion for Simultaneous Vision-Language Generation
The recently developed discrete diffusion models perform extraordinarily well
in the text-to-image task, showing significant promise for handling
multi-modality signals. In this work, we harness these traits and present a
unified multimodal generation model that can conduct both the "modality
translation" and "multi-modality generation" tasks using a single model,
performing text-based, image-based, and even vision-language simultaneous
generation. Specifically, we unify the discrete diffusion process for
multimodal signals by proposing a unified transition matrix. Moreover, we
design a mutual attention module with fused embedding layer and a unified
objective function to emphasise the inter-modal linkages, which are vital for
multi-modality generation. Extensive experiments indicate that our proposed
method performs comparably to state-of-the-art solutions in various
generation tasks.
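A unified transition matrix can be illustrated for a single step of discrete diffusion over one shared vocabulary with an absorbing [MASK] token. The specific stay/uniform/mask mixture below is a common discrete-diffusion construction and is our assumption for illustration, not necessarily the paper's exact matrix.

```python
import numpy as np

def unified_transition_matrix(k, alpha, gamma):
    """Single-step transition matrix Q over a vocabulary of k tokens
    plus one absorbing [MASK] token (index k).

    With probability `alpha` a token keeps its value, with probability
    `gamma` it jumps to [MASK], and the remainder is spread uniformly
    over the k tokens. Using one such matrix over the concatenated
    text+image vocabulary is the spirit of "unifying" the process.
    """
    beta = (1.0 - alpha - gamma) / k   # uniform leak to each token
    q = np.full((k + 1, k + 1), beta)
    np.fill_diagonal(q, alpha + beta)
    q[:, k] = gamma                    # jump to [MASK]
    q[k, :] = 0.0
    q[k, k] = 1.0                      # [MASK] is absorbing
    return q
```

Each row is a valid categorical distribution (sums to 1), which is the property any such transition matrix must satisfy.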
PartSeg: Few-shot Part Segmentation via Part-aware Prompt Learning
In this work, we address the task of few-shot part segmentation, which aims
to segment the different parts of an unseen object using very few labeled
examples. We find that leveraging the textual space of a powerful
pre-trained image-language model (such as CLIP) can be beneficial for learning
visual features. Therefore, we develop a novel method termed PartSeg for
few-shot part segmentation based on multimodal learning. Specifically, we
design a part-aware prompt learning method to generate part-specific prompts
that enable the CLIP model to better understand the concept of "part" and
fully utilize its textual space. Furthermore, since the concept of the same
part is general across different object categories, we establish relationships
between these parts during the prompt learning process. We conduct extensive
experiments on the PartImageNet and PascalPart datasets, and the
results demonstrate that our proposed method achieves
state-of-the-art performance.
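Sharing one part concept across categories, as described, can be sketched as prompt assembly: learnable context tokens plus a category token plus a part token that every category reuses. All names, vocabularies, and dimensions below are illustrative stand-ins, not the PartSeg implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8  # toy embedding width

# One shared embedding per part name: the same "head" vector is reused
# for "dog head" and "cat head", encoding that the part concept is
# general across object categories.
part_emb = {p: rng.normal(size=DIM) for p in ["head", "torso", "leg"]}
cat_emb = {c: rng.normal(size=DIM) for c in ["dog", "cat"]}
ctx = rng.normal(size=(4, DIM))  # learnable context tokens (CoOp-style)

def part_prompt(category, part):
    """Stack [ctx tokens, category token, shared part token] into one
    prompt matrix that a CLIP-like text encoder would consume."""
    return np.vstack([ctx, cat_emb[category][None], part_emb[part][None]])
```

Because the part token is shared, gradients from every category update the same part vector, which is one simple way to establish the cross-category relationships the abstract mentions.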
Domain Re-Modulation for Few-Shot Generative Domain Adaptation
In this study, we delve into the task of few-shot Generative Domain
Adaptation (GDA), which involves transferring a pre-trained generator from one
domain to a new domain using only a few reference images. Inspired by the way
human brains acquire knowledge in new domains, we present an innovative
generator structure called Domain Re-Modulation (DoRM). DoRM not only meets the
criteria of high quality, large synthesis diversity, and cross-domain
consistency, which were achieved by previous research in GDA, but also
incorporates memory and domain association, akin to how human brains operate.
Specifically, DoRM freezes the source generator and introduces new mapping and
affine modules (M&A modules) to capture the attributes of the target domain
during GDA. This process resembles the formation of new synapses in human
brains. Consequently, a linearly combinable domain shift occurs in the style
space. By incorporating multiple new M&A modules, the generator gains the
capability to perform high-fidelity multi-domain and hybrid-domain generation.
Moreover, to maintain cross-domain consistency more effectively, we introduce a
similarity-based structure loss. This loss aligns the auto-correlation map of
the target image with its corresponding auto-correlation map of the source
image during training. Through extensive experiments, we demonstrate the
superior performance of our DoRM and similarity-based structure loss in
few-shot GDA, both quantitatively and qualitatively. The code will be available
at https://github.com/wuyi2020/DoRM.
Comment: Under Review
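The similarity-based structure loss can be sketched in numpy: compute the cosine self-similarity (auto-correlation) map of source-domain and target-domain features for the same latent code and penalise their difference. The L1 penalty and cosine normalisation are our assumptions for illustration; the function names are ours.

```python
import numpy as np

def autocorr_map(feat):
    """Token-wise cosine self-similarity of a (n_tokens, dim) feature map."""
    f = feat / np.linalg.norm(feat, axis=1, keepdims=True)
    return f @ f.T

def structure_loss(feat_src, feat_tgt):
    """L1 distance between the auto-correlation maps of source and target
    features; minimising it keeps the spatial structure (which regions
    resemble which) consistent across domains, without forcing the raw
    features to match.
    """
    return np.abs(autocorr_map(feat_src) - autocorr_map(feat_tgt)).mean()
```

Aligning self-similarity rather than raw features is what lets the target generator change appearance (the new domain) while preserving layout (cross-domain consistency).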
OmniForce: On Human-Centered, Large Model Empowered and Cloud-Edge Collaborative AutoML System
Automated machine learning (AutoML) seeks to build ML models with minimal
human effort. While considerable research has been conducted on AutoML in
general, aiming to take humans out of the loop when building
artificial intelligence (AI) applications, little work has examined how to make
AutoML work well in open-environment scenarios, such as training
and updating large models, industrial supply chains, or the industrial
metaverse, where people often face open-loop problems during the search
process: they must continuously collect data, update data and models, satisfy
the requirements of the development and deployment environment, support massive
devices, modify evaluation metrics, etc. Addressing the open-environment issue
with pure data-driven approaches requires considerable data, computing
resources, and effort from dedicated data engineers, making current AutoML
systems and platforms inefficient and computationally intractable.
Human-computer interaction is a practical and feasible way to tackle the
problem of open-environment AI. In this paper, we introduce OmniForce, a
human-centered AutoML (HAML) system that yields both human-assisted ML and
ML-assisted human techniques, to put an AutoML system into practice and build
adaptive AI in open-environment scenarios. Specifically, we present OmniForce
in terms of ML version management; pipeline-driven development and deployment
collaborations; a flexible search strategy framework; and widely provisioned
and crowdsourced application algorithms, including large models. Furthermore,
the (large) models constructed by OmniForce can be automatically turned into
remote services in a few minutes; this process is dubbed model as a service
(MaaS). Experimental results obtained in multiple search spaces and real-world
use cases demonstrate the efficacy and efficiency of OmniForce.
Null-text Guidance in Diffusion Models is Secretly a Cartoon-style Creator
Classifier-free guidance is an effective sampling technique in diffusion
models that has been widely adopted. The main idea is to extrapolate the model
in the direction of text guidance and away from null-text guidance. In this
paper, we demonstrate that null-text guidance in diffusion models is secretly a
cartoon-style creator, i.e., the generated images can be efficiently
transformed into cartoons by simply perturbing the null-text guidance.
Specifically, we propose two disturbance methods, i.e., Rollback disturbance
(Back-D) and Image disturbance (Image-D), to construct misalignment between the
noisy images used for predicting null-text guidance and text guidance
(subsequently referred to as \textbf{null-text noisy image} and \textbf{text
noisy image}, respectively) in the sampling process. Back-D achieves
cartoonization by altering the noise level of the null-text noisy image,
replacing $x_t$ with $x_{t+\Delta t}$. Image-D, alternatively, produces
high-fidelity, diverse cartoons by defining $x_t$ as a clean input image, which
further improves the incorporation of finer image details. Through
comprehensive experiments, we delve into the principle of noise disturbance for
null-text guidance and uncover that the efficacy of the disturbance depends on the
correlation between the null-text noisy image and the source image. Moreover,
our proposed techniques, which can generate cartoon images and cartoonize
specific ones, are training-free and easily integrated as a plug-and-play
component in any classifier-free guided diffusion model. Project page is
available at \url{https://nulltextforcartoon.github.io/}
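The misalignment trick can be sketched as a classifier-free-guidance step in which the null-text branch receives its own noisy image instead of the shared one. Here `eps_model`, its signature, and the guidance weight are hypothetical stand-ins for any diffusion noise predictor; this is our sketch, not the project's code.

```python
import numpy as np

def cfg_step(eps_model, x_t, x_null, w=7.5):
    """One classifier-free-guidance noise prediction where the null-text
    branch sees its own noisy image `x_null` rather than `x_t`.

    `eps_model(x, cond)` is any noise predictor; cond=None means null text.
    Setting x_null = x_t recovers standard CFG; feeding a rolled-back
    (noisier) or clean image constructs the Back-D / Image-D misalignment.
    """
    eps_text = eps_model(x_t, "prompt")   # text-conditioned prediction
    eps_null = eps_model(x_null, None)    # null-text prediction, perturbed input
    return eps_null + w * (eps_text - eps_null)
```

Because the change is confined to which image the null-text branch sees, it drops into any classifier-free guided sampler as a plug-and-play component, exactly as the abstract claims.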